Overview

Brought to you by YData

Dataset statistics

Number of variables: 22
Number of observations: 3116945
Missing cells: 15868508
Missing cells (%): 23.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.4 GiB
Average record size in memory: 840.4 B
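The percentage and size figures above can be re-derived from the raw counts. This is a minimal sketch under two assumptions, since the report states neither formula: "Missing cells (%)" divides missing cells by variables × observations, and the total size is the average record size times the number of observations.

```python
# Dataset-level counts copied from the report above.
N_VARIABLES = 22
N_OBSERVATIONS = 3_116_945
MISSING_CELLS = 15_868_508
AVG_RECORD_BYTES = 840.4  # "Average record size in memory"


def missing_cells_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Missing cells as a percentage of all cells (variables x observations)."""
    return 100.0 * missing / (n_vars * n_obs)


def total_gib(avg_record_bytes: float, n_obs: int) -> float:
    """Approximate total in-memory size in GiB (1 GiB = 2**30 bytes)."""
    return avg_record_bytes * n_obs / 2**30


if __name__ == "__main__":
    share = missing_cells_pct(MISSING_CELLS, N_VARIABLES, N_OBSERVATIONS)
    print(f"Missing cells (%): {share:.1f}%")
    print(f"Total size in memory: {total_gib(AVG_RECORD_BYTES, N_OBSERVATIONS):.1f} GiB")
```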

Variable types

Numeric: 4
Categorical: 10
Text: 8

Alerts

cap-diameter is highly overall correlated with stem-height and 1 other field [High correlation]
class is highly overall correlated with stem-root [High correlation]
stem-height is highly overall correlated with cap-diameter [High correlation]
stem-root is highly overall correlated with class [High correlation]
stem-width is highly overall correlated with cap-diameter [High correlation]
does-bruise-or-bleed is highly imbalanced (85.7%) [Imbalance]
gill-spacing is highly imbalanced (80.6%) [Imbalance]
stem-root is highly imbalanced (66.8%) [Imbalance]
veil-type is highly imbalanced (99.8%) [Imbalance]
veil-color is highly imbalanced (69.8%) [Imbalance]
has-ring is highly imbalanced (82.4%) [Imbalance]
ring-type is highly imbalanced (79.3%) [Imbalance]
spore-print-color is highly imbalanced (56.6%) [Imbalance]
cap-surface has 671023 (21.5%) missing values [Missing]
gill-attachment has 523936 (16.8%) missing values [Missing]
gill-spacing has 1258435 (40.4%) missing values [Missing]
stem-root has 2757023 (88.5%) missing values [Missing]
stem-surface has 1980861 (63.6%) missing values [Missing]
veil-type has 2957493 (94.9%) missing values [Missing]
veil-color has 2740947 (87.9%) missing values [Missing]
ring-type has 128880 (4.1%) missing values [Missing]
spore-print-color has 2849682 (91.4%) missing values [Missing]
id is uniformly distributed [Uniform]
id has unique values [Unique]
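The imbalance percentages appear consistent with a normalized-entropy score, 1 - H/ln(k). Here H is the Shannon entropy (natural log) of the category frequencies among non-missing values, and k is the number of distinct categories. Recomputing it from the counts reported for veil-color gives roughly 70%, in line with the 69.8% alert. The sketch below implements that formula as an assumption. It is not taken from ydata-profiling's source.

```python
import math
from collections.abc import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """Return 1 - H / ln(k) for category counts (assumed formulation).

    0 means perfectly balanced; values near 1 mean one category dominates.
    H is the Shannon entropy of the frequencies, k the number of categories.
    """
    k = len(counts)
    if k <= 1:
        return 0.0  # a single category has no balance to measure
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


if __name__ == "__main__":
    # Counts of the `class` variable in this report: p and e.
    print(f"class imbalance: {imbalance_score([1705396, 1411549]):.4f}")
```

With the two `class` counts the score stays close to zero, which fits `class` not appearing among the imbalance alerts.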

Reproduction

Analysis started: 2024-09-04 09:09:27.286239
Analysis finished: 2024-09-04 09:12:31.396761
Duration: 3 minutes and 4.11 seconds
Software version: ydata-profiling v4.9.0
Download configuration: config.json

Variables

id
Real number (ℝ)

UNIFORM  UNIQUE 

Distinct: 3116945
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1558472
Minimum: 0
Maximum: 3116944
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.8 MiB
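Every numeric column in this report shows 23.8 MiB. That matches 3116945 values stored at 8 bytes each (int64 or float64). The sketch below reproduces the arithmetic, assuming fixed-width 8-byte storage and ignoring index overhead. The text and categorical columns report much larger sizes (148.6 MiB and up), presumably because each value is held as a separate Python string object.

```python
N_OBSERVATIONS = 3_116_945


def column_mib(n_values: int, bytes_per_value: int = 8) -> float:
    """Size of a fixed-width numeric column in MiB (1 MiB = 2**20 bytes)."""
    return n_values * bytes_per_value / 2**20


if __name__ == "__main__":
    print(f"{column_mib(N_OBSERVATIONS):.1f} MiB")
```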

Quantile statistics

Minimum: 0
5th percentile: 155847.2
Q1: 779236
Median: 1558472
Q3: 2337708
95th percentile: 2961096.8
Maximum: 3116944
Range: 3116944
Interquartile range (IQR): 1558472
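These quantiles are consistent with linear interpolation between order statistics. In that method the target position in the sorted data is q·(n-1), which is the default in NumPy's `percentile` and pandas' `quantile`. The report does not name its method, so the sketch below assumes this one. Because id takes every integer from 0 to 3116944 exactly once, a `range` can stand in for the column.

```python
import math
from collections.abc import Sequence


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile of already-sorted data by linear interpolation.

    The target position is q * (n - 1); the result interpolates between the
    values just below and just above that position.
    """
    n = len(sorted_values)
    if n == 0:
        raise ValueError("quantile of empty data")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    pos = q * (n - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


if __name__ == "__main__":
    ids = range(3_116_945)  # stand-in for the gap-free id column 0..3116944
    for label, q in [("5th percentile", 0.05), ("Q1", 0.25), ("Median", 0.5),
                     ("Q3", 0.75), ("95th percentile", 0.95)]:
        print(f"{label}: {quantile(ids, q):.1f}")
```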

Descriptive statistics

Standard deviation: 899784.66
Coefficient of variation (CV): 0.57735055
Kurtosis: -1.2
Mean: 1558472
Median Absolute Deviation (MAD): 779236
Skewness: -2.5075827 × 10⁻¹⁵
Sum: 4.8576715 × 10¹²
Variance: 8.0961244 × 10¹¹
Monotonicity: Strictly increasing
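For a gap-free integer sequence like id, several of these figures follow from theory:

- The coefficient of variation σ/μ approaches 1/√3 ≈ 0.57735.
- The excess kurtosis of a discrete uniform distribution approaches -1.2.
- The skewness is zero up to floating-point rounding, which is why it appears as a number of order 10⁻¹⁵.

The sketch below computes population moments directly on a smaller stand-in sequence and checks strict monotonicity. Unlike pandas' `skew` and `kurt`, it applies no small-sample bias correction, a difference assumed to matter little at n ≈ 3.1 million.

```python
import math
from collections.abc import Sequence


def moments(values: Sequence[float]) -> dict[str, float]:
    """Population mean, std, CV, skewness and excess kurtosis (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,                # coefficient of variation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis: 0 for a normal distribution
    }


def is_strictly_increasing(values: Sequence[float]) -> bool:
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    ids = range(100_000)  # smaller stand-in for the 0..3116944 id column
    stats = moments(ids)
    print(f"CV={stats['cv']:.5f} vs 1/sqrt(3)={1 / math.sqrt(3):.5f}")
    print(f"skewness={stats['skewness']:.2e}, excess kurtosis={stats['kurtosis']:.4f}")
    print("strictly increasing:", is_strictly_increasing(ids))
```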
[Figure: histogram with fixed size bins (bins=50)]

Common Values

ValueCountFrequency (%)
0 1
 
< 0.1%
2077967 1
 
< 0.1%
2077958 1
 
< 0.1%
2077959 1
 
< 0.1%
2077960 1
 
< 0.1%
2077961 1
 
< 0.1%
2077962 1
 
< 0.1%
2077963 1
 
< 0.1%
2077964 1
 
< 0.1%
2077965 1
 
< 0.1%
Other values (3116935) 3116935
> 99.9%

Minimum 10 values

ValueCountFrequency (%)
0 1
< 0.1%
1 1
< 0.1%
2 1
< 0.1%
3 1
< 0.1%
4 1
< 0.1%
5 1
< 0.1%
6 1
< 0.1%
7 1
< 0.1%
8 1
< 0.1%
9 1
< 0.1%

Maximum 10 values

ValueCountFrequency (%)
3116944 1
< 0.1%
3116943 1
< 0.1%
3116942 1
< 0.1%
3116941 1
< 0.1%
3116940 1
< 0.1%
3116939 1
< 0.1%
3116938 1
< 0.1%
3116937 1
< 0.1%
3116936 1
< 0.1%
3116935 1
< 0.1%

class
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 148.6 MiB
p 1705396
e 1411549

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 3116945
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
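Python's standard `unicodedata` module exposes one of these character properties, the general category. Examples are `Ll` for lowercase letters, `Nd` for decimal digits and `Po` for punctuation such as `.`; the module does not expose scripts or blocks. This report lists every category as "(unknown)", and several columns contain digits and `.` among their characters. The sketch below tallies categories to show what a resolved breakdown could look like. The noisy sample entry is hypothetical.

```python
import unicodedata
from collections import Counter
from collections.abc import Iterable


def category_counts(values: Iterable[str]) -> Counter[str]:
    """Count characters across all values by Unicode general category."""
    counts: Counter[str] = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1  # e.g. 'Ll', 'Nd', 'Po'
    return counts


if __name__ == "__main__":
    # Single-letter codes like those in `class`, plus one hypothetical noisy entry.
    sample = ["p", "e", "e", "3.14"]
    for category, n in category_counts(sample).most_common():
        print(category, n)
```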

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: e
2nd row: p
3rd row: e
4th row: e
5th row: e

Common Values

ValueCountFrequency (%)
p 1705396
54.7%
e 1411549
45.3%

Length

[Figure: histogram of lengths of the category]

Most occurring characters

ValueCountFrequency (%)
p 1705396
54.7%
e 1411549
45.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3116945
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3116945
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3116945
100.0%

cap-diameter
Real number (ℝ)

HIGH CORRELATION 

Distinct: 3913
Distinct (%): 0.1%
Missing: 4
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.3098484
Minimum: 0.03
Maximum: 80.67
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.8 MiB

Quantile statistics

Minimum: 0.03
5th percentile: 1.34
Q1: 3.32
Median: 5.75
Q3: 8.24
95th percentile: 13.23
Maximum: 80.67
Range: 80.64
Interquartile range (IQR): 4.92

Descriptive statistics

Standard deviation: 4.6579305
Coefficient of variation (CV): 0.73820007
Kurtosis: 32.743381
Mean: 6.3098484
Median Absolute Deviation (MAD): 2.46
Skewness: 3.9726092
Sum: 19667425
Variance: 21.696317
Monotonicity: Not monotonic
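The median absolute deviation (MAD) is the median of |x - median(x)|. The report appears to use the unscaled form, with no 1.4826 normal-consistency factor: the id column's MAD of 779236 is exactly a quarter of its range. The stdlib sketch below assumes that unscaled form. It uses made-up diameters with one extreme value to show how little the MAD reacts to a long right tail, such as the 80.67 maximum here, compared with the standard deviation.

```python
import statistics
from collections.abc import Sequence


def median_absolute_deviation(values: Sequence[float]) -> float:
    """Median of |x - median(x)|, without any normal-consistency scaling."""
    center = statistics.median(values)
    return statistics.median([abs(x - center) for x in values])


if __name__ == "__main__":
    # Hypothetical diameters with one extreme value.
    diameters = [3.3, 4.1, 5.8, 6.2, 8.2, 80.7]
    print("MAD:", round(median_absolute_deviation(diameters), 2))
    print("standard deviation:", round(statistics.stdev(diameters), 2))
```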
[Figure: histogram with fixed size bins (bins=50)]

Common Values

ValueCountFrequency (%)
1.49 8164
 
0.3%
3.18 7942
 
0.3%
3.14 7361
 
0.2%
1.51 7072
 
0.2%
4.04 6828
 
0.2%
3.28 6826
 
0.2%
2.87 6807
 
0.2%
3.85 6642
 
0.2%
3.24 6634
 
0.2%
1.52 6562
 
0.2%
Other values (3903) 3046103
97.7%
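Each frequency table lists the ten most common values and folds the rest into an "Other values (m)" row, where m counts the remaining distinct values. For cap-diameter, 3913 distinct values minus the 10 listed leaves 3903. The ten listed counts (70838 in total), plus 3046103, plus the 4 missing values add back up to all 3116945 rows. The sketch below reproduces that aggregation. The row format is an assumption, and None stands in for a missing value.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def top_values_with_other(values: Iterable[Hashable], k: int = 10) -> list[tuple[str, int, float]]:
    """(label, count, percent) rows: the k most common values plus one 'Other values (m)' row.

    None is treated as missing: it is left out of the counts but kept in the
    denominator, so percentages are relative to all rows.
    """
    items = list(values)
    total = len(items)
    counts = Counter(v for v in items if v is not None)
    top = counts.most_common(k)
    rows = [(str(value), count, 100.0 * count / total) for value, count in top]
    shown = {value for value, _ in top}
    rest = [value for value in counts if value not in shown]
    if rest:
        other = sum(counts[value] for value in rest)
        rows.append((f"Other values ({len(rest)})", other, 100.0 * other / total))
    return rows


if __name__ == "__main__":
    data = ["x"] * 5 + ["f"] * 3 + ["s", "b", None]  # hypothetical column, one missing value
    for label, count, percent in top_values_with_other(data, k=2):
        print(f"{label} {count} {percent:.1f}%")
```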

Minimum 10 values

ValueCountFrequency (%)
0.03 1
 
< 0.1%
0.1 1
 
< 0.1%
0.3 1
 
< 0.1%
0.38 1
 
< 0.1%
0.4 6
 
< 0.1%
0.41 2
 
< 0.1%
0.42 3
 
< 0.1%
0.44 15
< 0.1%
0.45 2
 
< 0.1%
0.46 5
 
< 0.1%

Maximum 10 values

ValueCountFrequency (%)
80.67 1
< 0.1%
64.46 1
< 0.1%
62.4 1
< 0.1%
62.3 1
< 0.1%
62.06 1
< 0.1%
62.01 1
< 0.1%
60.97 1
< 0.1%
59.76 1
< 0.1%
59.74 2
< 0.1%
59.66 1
< 0.1%

cap-shape
Text

Distinct: 74
Distinct (%): < 0.1%
Missing: 40
Missing (%): < 0.1%
Memory size: 148.6 MiB

Length

Max length: 9
Median length: 1
Mean length: 1.0000536
Min length: 1
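The mean length is total characters divided by non-missing values: 3117072 / (3116945 - 40) = 1.0000536. Nearly every value is a single character; only a few longer strings (up to 9 characters) lift the mean just above 1. The sketch below computes the four length statistics, assuming missing values (None) are skipped. The sample values are hypothetical.

```python
import statistics
from collections.abc import Iterable
from typing import Optional


def length_stats(values: Iterable[Optional[str]]) -> dict[str, float]:
    """Max, median, mean and min string length, skipping missing (None) values."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        raise ValueError("no non-missing values")
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),  # total characters / non-missing values
        "min": min(lengths),
    }


if __name__ == "__main__":
    print(length_stats(["x", "f", None, "s", "x", "sunken"]))  # hypothetical values
    # cap-shape: total characters over non-missing values (3116945 rows, 40 missing)
    print(f"{3_117_072 / (3_116_945 - 40):.7f}")
```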

Characters and Unicode

Total characters: 3117072
Distinct characters: 36
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 47
Unique (%): < 0.1%

Sample

1st row: f
2nd row: x
3rd row: f
4th row: f
5th row: x
ValueCountFrequency (%)
x 1436030
46.1%
f 676240
21.7%
s 365147
 
11.7%
b 318647
 
10.2%
o 108835
 
3.5%
p 106968
 
3.4%
c 104520
 
3.4%
d 65
 
< 0.1%
e 60
 
< 0.1%
n 41
 
< 0.1%
Other values (62) 360
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
x 1436030
46.1%
f 676240
21.7%
s 365149
 
11.7%
b 318647
 
10.2%
o 108835
 
3.5%
p 106969
 
3.4%
c 104520
 
3.4%
d 65
 
< 0.1%
e 61
 
< 0.1%
. 44
 
< 0.1%
Other values (26) 512
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3117072
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3117072
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3117072
100.0%

cap-surface
Text

MISSING 

Distinct: 83
Distinct (%): < 0.1%
Missing: 671023
Missing (%): 21.5%
Memory size: 137.1 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0001402
Min length: 1

Characters and Unicode

Total characters: 2446265
Distinct characters: 38
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 54
Unique (%): < 0.1%

Sample

1st row: s
2nd row: h
3rd row: s
4th row: y
5th row: l
ValueCountFrequency (%)
t 460779
18.8%
s 384970
15.7%
y 327827
13.4%
h 284463
11.6%
g 263729
10.8%
d 206832
8.5%
k 128876
 
5.3%
e 119712
 
4.9%
i 113440
 
4.6%
w 109840
 
4.5%
Other values (68) 45465
 
1.9%

Most occurring characters

ValueCountFrequency (%)
t 460785
18.8%
s 385005
15.7%
y 327831
13.4%
h 284466
11.6%
g 263735
10.8%
d 206841
8.5%
k 128876
 
5.3%
e 119741
 
4.9%
i 113454
 
4.6%
w 109840
 
4.5%
Other values (28) 45691
 
1.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2446265
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2446265
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2446265
100.0%


cap-color
Text

Distinct: 78
Distinct (%): < 0.1%
Missing: 12
Missing (%): < 0.1%
Memory size: 148.6 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0001011
Min length: 1

Characters and Unicode

Total characters: 3117248
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 49
Unique (%): < 0.1%

Sample

1st row: u
2nd row: o
3rd row: b
4th row: g
5th row: w
ValueCountFrequency (%)
n 1359544
43.6%
y 386627
 
12.4%
w 379442
 
12.2%
g 210825
 
6.8%
e 197290
 
6.3%
o 178847
 
5.7%
p 91838
 
2.9%
r 78236
 
2.5%
u 73172
 
2.3%
b 61313
 
2.0%
Other values (68) 99801
 
3.2%

Most occurring characters

ValueCountFrequency (%)
n 1359556
43.6%
y 386633
 
12.4%
w 379442
 
12.2%
g 210831
 
6.8%
e 197314
 
6.3%
o 178860
 
5.7%
p 91844
 
2.9%
r 78248
 
2.5%
u 73175
 
2.3%
b 61317
 
2.0%
Other values (27) 100028
 
3.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3117248
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3117248
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3117248
100.0%

does-bruise-or-bleed
Categorical

IMBALANCE 

Distinct: 26
Distinct (%): < 0.1%
Missing: 8
Missing (%): < 0.1%
Memory size: 148.6 MiB
f 2569743
t 547085
w 14
c 11
h 9
Other values (21) 75

Length

Max length: 8
Median length: 1
Mean length: 1.0000048
Min length: 1

Characters and Unicode

Total characters: 3116952
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: f
2nd row: f
3rd row: f
4th row: f
5th row: f

Common Values

ValueCountFrequency (%)
f 2569743
82.4%
t 547085
 
17.6%
w 14
 
< 0.1%
c 11
 
< 0.1%
h 9
 
< 0.1%
y 7
 
< 0.1%
a 7
 
< 0.1%
b 7
 
< 0.1%
x 7
 
< 0.1%
s 6
 
< 0.1%
Other values (16) 41
 
< 0.1%
(Missing) 8
 
< 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

ValueCountFrequency (%)
f 2569743
82.4%
t 547085
 
17.6%
w 14
 
< 0.1%
c 11
 
< 0.1%
h 9
 
< 0.1%
y 7
 
< 0.1%
a 7
 
< 0.1%
b 7
 
< 0.1%
x 7
 
< 0.1%
s 6
 
< 0.1%
Other values (16) 41
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
f 2569743
82.4%
t 547085
 
17.6%
w 14
 
< 0.1%
c 11
 
< 0.1%
h 10
 
< 0.1%
a 8
 
< 0.1%
y 7
 
< 0.1%
b 7
 
< 0.1%
x 7
 
< 0.1%
s 7
 
< 0.1%
Other values (18) 53
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3116952
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3116952
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3116952
100.0%

gill-attachment
Text

MISSING 

Distinct: 78
Distinct (%): < 0.1%
Missing: 523936
Missing (%): 16.8%
Memory size: 139.6 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0000891
Min length: 1

Characters and Unicode

Total characters: 2593240
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 53
Unique (%): < 0.1%

Sample

1st row: a
2nd row: a
3rd row: x
4th row: s
5th row: d
ValueCountFrequency (%)
a 646035
24.9%
d 589237
22.7%
x 360878
13.9%
e 301858
11.6%
s 295439
11.4%
p 279112
10.8%
f 119956
 
4.6%
c 74
 
< 0.1%
u 56
 
< 0.1%
w 37
 
< 0.1%
Other values (64) 334
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
a 646042
24.9%
d 589242
22.7%
x 360878
13.9%
e 301872
11.6%
s 295458
11.4%
p 279113
10.8%
f 119956
 
4.6%
c 74
 
< 0.1%
u 57
 
< 0.1%
. 44
 
< 0.1%
Other values (27) 504
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2593240
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2593240
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2593240
100.0%

gill-spacing
Categorical

IMBALANCE  MISSING 

Distinct: 48
Distinct (%): < 0.1%
Missing: 1258435
Missing (%): 40.4%
Memory size: 155.8 MiB
c 1331054
d 407932
f 119380
e 24
a 17
Other values (43) 103

Length

Max length: 11
Median length: 1
Mean length: 1.00005
Min length: 1

Characters and Unicode

Total characters: 1858603
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 30
Unique (%): < 0.1%

Sample

1st row: c
2nd row: c
3rd row: c
4th row: c
5th row: c

Common Values

ValueCountFrequency (%)
c 1331054
42.7%
d 407932
 
13.1%
f 119380
 
3.8%
e 24
 
< 0.1%
a 17
 
< 0.1%
s 16
 
< 0.1%
b 12
 
< 0.1%
t 8
 
< 0.1%
x 8
 
< 0.1%
p 7
 
< 0.1%
Other values (38) 52
 
< 0.1%
(Missing) 1258435
40.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

ValueCountFrequency (%)
c 1331054
71.6%
d 407932
 
21.9%
f 119380
 
6.4%
e 24
 
< 0.1%
a 17
 
< 0.1%
s 16
 
< 0.1%
b 12
 
< 0.1%
t 8
 
< 0.1%
x 8
 
< 0.1%
p 7
 
< 0.1%
Other values (38) 52
 
< 0.1%
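The same count carries two different percentages in this section. In Common Values, c's 1331054 rows are 42.7% of all 3116945 rows. In the table after the length histogram, the same rows are 71.6% of the 1858510 non-missing rows (3116945 - 1258435). The quick check below confirms both denominators.

```python
N_ROWS = 3_116_945
N_MISSING = 1_258_435  # gill-spacing missing values


def pct(count: int, denominator: int) -> float:
    """Percentage of `count` relative to `denominator`."""
    return 100.0 * count / denominator


if __name__ == "__main__":
    c_count = 1_331_054
    print(f"of all rows:         {pct(c_count, N_ROWS):.1f}%")
    print(f"of non-missing rows: {pct(c_count, N_ROWS - N_MISSING):.1f}%")
```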

Most occurring characters

ValueCountFrequency (%)
c 1331057
71.6%
d 407933
 
21.9%
f 119382
 
6.4%
e 26
 
< 0.1%
. 25
 
< 0.1%
s 20
 
< 0.1%
a 20
 
< 0.1%
b 12
 
< 0.1%
2 10
 
< 0.1%
3 10
 
< 0.1%
Other values (24) 108
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 1858603
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 1858603
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 1858603
100.0%


gill-color
Text

Distinct: 63
Distinct (%): < 0.1%
Missing: 57
Missing (%): < 0.1%
Memory size: 148.6 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0001078
Min length: 1

Characters and Unicode

Total characters: 3117224
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 31
Unique (%): < 0.1%

Sample

1st row: w
2nd row: n
3rd row: w
4th row: g
5th row: w
ValueCountFrequency (%)
w 931539
29.9%
n 543387
17.4%
y 469466
15.1%
p 343626
 
11.0%
g 212164
 
6.8%
o 157119
 
5.0%
k 127970
 
4.1%
f 119694
 
3.8%
r 62799
 
2.0%
e 56048
 
1.8%
Other values (51) 93080
 
3.0%

Most occurring characters

ValueCountFrequency (%)
w 931539
29.9%
n 543409
17.4%
y 469472
15.1%
p 343642
 
11.0%
g 212176
 
6.8%
o 157141
 
5.0%
k 127970
 
4.1%
f 119694
 
3.8%
r 62819
 
2.0%
e 56072
 
1.8%
Other values (27) 93290
 
3.0%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3117224
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3117224
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3117224
100.0%

stem-height
Real number (ℝ)

HIGH CORRELATION 

Distinct: 2749
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.3483333
Minimum: 0
Maximum: 88.72
Zeros: 554
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.8 MiB

Quantile statistics

Minimum: 0
5th percentile: 3.16
Q1: 4.67
Median: 5.88
Q3: 7.41
95th percentile: 11.2
Maximum: 88.72
Range: 88.72
Interquartile range (IQR): 2.74

Descriptive statistics

Standard deviation: 2.6997548
Coefficient of variation (CV): 0.42526985
Kurtosis: 7.7615498
Mean: 6.3483333
Median Absolute Deviation (MAD): 1.33
Skewness: 1.9266817
Sum: 19787406
Variance: 7.288676
Monotonicity: Not monotonic
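The fixed-size-bin histograms use 50 equal-width bins spanning the minimum to the maximum. For stem-height each bin is 88.72 / 50 = 1.7744 wide. The interquartile range (4.67 to 7.41) therefore falls into just three bins (indices 2 to 4), while the single 88.72 value stretches the axis. The stdlib sketch below does equal-width binning, assuming the last bin is closed on the right so the maximum is counted, as in NumPy's `histogram`.

```python
from collections.abc import Sequence


def fixed_width_histogram(values: Sequence[float], bins: int = 50) -> tuple[list[int], list[float]]:
    """Count values in `bins` equal-width bins spanning [min, max]; returns (counts, edges)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bin
        else:
            idx = min(int((x - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return counts, edges


if __name__ == "__main__":
    counts, edges = fixed_width_histogram([0.0, 4.67, 5.88, 7.41, 88.72], bins=50)
    print(f"bin width: {edges[1] - edges[0]:.4f}")
    print("occupied bins:", [i for i, c in enumerate(counts) if c])
```

Computing the bin index by division can disagree with NumPy by one bin for values sitting exactly on an edge, because of floating-point rounding.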
[Figure: histogram with fixed size bins (bins=50)]

Common Values

ValueCountFrequency (%)
5.24 12332
 
0.4%
5.92 11821
 
0.4%
5.32 10988
 
0.4%
5.35 10431
 
0.3%
5.99 10402
 
0.3%
6.03 10271
 
0.3%
5.54 10265
 
0.3%
5.77 10153
 
0.3%
4.27 10080
 
0.3%
5.65 9994
 
0.3%
Other values (2739) 3010208
96.6%

Minimum 10 values

ValueCountFrequency (%)
0 554
< 0.1%
0.74 1
 
< 0.1%
0.77 1
 
< 0.1%
0.91 1
 
< 0.1%
0.93 1
 
< 0.1%
0.97 2
 
< 0.1%
0.98 1
 
< 0.1%
1 1
 
< 0.1%
1.01 1
 
< 0.1%
1.03 1
 
< 0.1%

Maximum 10 values

ValueCountFrequency (%)
88.72 1
< 0.1%
57.22 1
< 0.1%
53.93 1
< 0.1%
53.87 1
< 0.1%
53.82 1
< 0.1%
53.03 1
< 0.1%
51.41 1
< 0.1%
50.78 1
< 0.1%
50.27 1
< 0.1%
49.37 1
< 0.1%

stem-width
Real number (ℝ)

HIGH CORRELATION 

Distinct: 5836
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.153785
Minimum: 0
Maximum: 102.9
Zeros: 497
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.8 MiB
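The counts in each numeric overview block are simple tallies over the column. For example, stem-width's 497 zeros are about 0.016% of 3116945 rows, hence "< 0.1%". The sketch below rests on two assumptions: missing values arrive as None or NaN, and every percentage uses all rows as its denominator.

```python
import math
from collections.abc import Iterable
from typing import Optional


def _is_missing(v: Optional[float]) -> bool:
    """Treat None and NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def numeric_overview(values: Iterable[Optional[float]]) -> dict[str, float]:
    """Overview counts for a numeric column, each with its share of all rows in percent."""
    rows = list(values)
    if not rows:
        raise ValueError("empty column")
    present = [v for v in rows if not _is_missing(v)]
    counts = {
        "distinct": len(set(present)),
        "missing": len(rows) - len(present),
        "infinite": sum(1 for v in present if math.isinf(v)),
        "zeros": sum(1 for v in present if v == 0),
        "negative": sum(1 for v in present if v < 0),
    }
    percents = {f"{name} (%)": 100.0 * count / len(rows) for name, count in counts.items()}
    return {**counts, **percents}


if __name__ == "__main__":
    print(f"stem-width zeros: {100 * 497 / 3_116_945:.3f}%")
    print(numeric_overview([0.0, 1.5, 1.5, None, float("nan"), -2.0, float("inf")]))
```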

Quantile statistics

Minimum: 0
5th percentile: 1.58
Q1: 4.97
Median: 9.65
Q3: 15.63
95th percentile: 26.49
Maximum: 102.9
Range: 102.9
Interquartile range (IQR): 10.66

Descriptive statistics

Standard deviation: 8.0954773
Coefficient of variation (CV): 0.72580538
Kurtosis: 2.4489761
Mean: 11.153785
Median Absolute Deviation (MAD): 5.24
Skewness: 1.2354271
Sum: 34765735
Variance: 65.536753
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

ValueCountFrequency (%)
2.41 7829
 
0.3%
2.45 7353
 
0.2%
2.49 7087
 
0.2%
2.56 6824
 
0.2%
2.47 6709
 
0.2%
2.52 6660
 
0.2%
2.51 6535
 
0.2%
2.64 6467
 
0.2%
2.6 6366
 
0.2%
2.61 6117
 
0.2%
Other values (5826) 3048998
97.8%

Minimum 10 values

ValueCountFrequency (%)
0 497
< 0.1%
0.44 2
 
< 0.1%
0.48 2
 
< 0.1%
0.49 1
 
< 0.1%
0.5 3
 
< 0.1%
0.51 1
 
< 0.1%
0.52 21
 
< 0.1%
0.53 16
 
< 0.1%
0.54 11
 
< 0.1%
0.55 12
 
< 0.1%

Maximum 10 values

ValueCountFrequency (%)
102.9 1
 
< 0.1%
102.48 6
< 0.1%
101.69 3
< 0.1%
98 2
 
< 0.1%
94.24 3
< 0.1%
94.05 1
 
< 0.1%
92.51 1
 
< 0.1%
91.91 1
 
< 0.1%
91 1
 
< 0.1%
89.45 2
 
< 0.1%

stem-root
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 38
Distinct (%): < 0.1%
Missing: 2757023
Missing (%): 88.5%
Memory size: 164.4 MiB
b 165801
s 116946
r 47803
c 28592
f 597
Other values (33) 183

Length

Max length: 17
Median length: 1
Mean length: 1.0001778
Min length: 1

Characters and Unicode

Total characters: 359986
Distinct characters: 35
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 15
Unique (%): < 0.1%

Sample

1st row: b
2nd row: b
3rd row: c
4th row: b
5th row: r

Common Values

ValueCountFrequency (%)
b 165801
 
5.3%
s 116946
 
3.8%
r 47803
 
1.5%
c 28592
 
0.9%
f 597
 
< 0.1%
d 24
 
< 0.1%
y 14
 
< 0.1%
g 12
 
< 0.1%
p 12
 
< 0.1%
w 12
 
< 0.1%
Other values (28) 109
 
< 0.1%
(Missing) 2757023
88.5%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

ValueCountFrequency (%)
b 165801
46.1%
s 116946
32.5%
r 47803
 
13.3%
c 28592
 
7.9%
f 597
 
0.2%
d 24
 
< 0.1%
y 14
 
< 0.1%
p 12
 
< 0.1%
w 12
 
< 0.1%
g 12
 
< 0.1%
Other values (28) 109
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
b 165801
46.1%
s 116947
32.5%
r 47806
 
13.3%
c 28593
 
7.9%
f 597
 
0.2%
d 24
 
< 0.1%
p 14
 
< 0.1%
. 14
 
< 0.1%
y 14
 
< 0.1%
g 12
 
< 0.1%
Other values (25) 164
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 359986
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 359986
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 359986
100.0%

stem-surface
Text

MISSING 

Distinct: 60
Distinct (%): < 0.1%
Missing: 1980861
Missing (%): 63.6%
Memory size: 114.6 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0001919
Min length: 1

Characters and Unicode

Total characters: 1136302
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 32
Unique (%): < 0.1%

Sample

1st row: y
2nd row: s
3rd row: s
4th row: t
5th row: s
ValueCountFrequency (%)
s 327611
28.8%
y 255500
22.5%
i 224346
19.7%
t 147974
13.0%
g 78080
 
6.9%
k 73383
 
6.5%
h 28284
 
2.5%
f 512
 
< 0.1%
w 49
 
< 0.1%
d 48
 
< 0.1%
Other values (50) 300
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
s 327634
28.8%
y 255500
22.5%
i 224350
19.7%
t 147975
13.0%
g 78081
 
6.9%
k 73383
 
6.5%
h 28286
 
2.5%
f 512
 
< 0.1%
d 54
 
< 0.1%
e 54
 
< 0.1%
Other values (27) 473
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 1136302
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 1136302
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 1136302
100.0%


stem-color
Text

Distinct: 59
Distinct (%): < 0.1%
Missing: 38
Missing (%): < 0.1%
Memory size: 148.6 MiB

Length

Max length: 17
Median length: 1
Mean length: 1.0000539
Min length: 1

Characters and Unicode

Total characters: 3117075
Distinct characters: 36
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 33
Unique (%): < 0.1%

Sample

1st row: w
2nd row: o
3rd row: n
4th row: w
5th row: w
ValueCountFrequency (%)
w 1196638
38.4%
n 1003466
32.2%
y 373971
 
12.0%
g 132019
 
4.2%
o 111541
 
3.6%
e 103374
 
3.3%
u 67017
 
2.2%
p 54690
 
1.8%
k 33676
 
1.1%
r 22329
 
0.7%
Other values (47) 18189
 
0.6%

Most occurring characters

ValueCountFrequency (%)
w 1196638
38.4%
n 1003471
32.2%
y 373974
 
12.0%
g 132022
 
4.2%
o 111547
 
3.6%
e 103379
 
3.3%
u 67017
 
2.1%
p 54697
 
1.8%
k 33676
 
1.1%
r 22338
 
0.7%
Other values (26) 18316
 
0.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3117075
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3117075
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3117075
100.0%

veil-type
Categorical

IMBALANCE  MISSING 

Distinct: 22
Distinct (%): < 0.1%
Missing: 2957493
Missing (%): 94.9%
Memory size: 165.6 MiB
u 159373
w 11
a 9
e 8
f 8
Other values (17) 43

Length

Max length: 7
Median length: 1
Mean length: 1.0000815
Min length: 1

Characters and Unicode

Total characters: 159465
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: u
2nd row: u
3rd row: u
4th row: u
5th row: u

Common Values

ValueCountFrequency (%)
u 159373
 
5.1%
w 11
 
< 0.1%
a 9
 
< 0.1%
e 8
 
< 0.1%
f 8
 
< 0.1%
b 5
 
< 0.1%
c 5
 
< 0.1%
g 4
 
< 0.1%
y 4
 
< 0.1%
k 4
 
< 0.1%
Other values (12) 21
 
< 0.1%
(Missing) 2957493
94.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

ValueCountFrequency (%)
u 159373
99.9%
w 11
 
< 0.1%
a 9
 
< 0.1%
e 8
 
< 0.1%
f 8
 
< 0.1%
b 5
 
< 0.1%
c 5
 
< 0.1%
g 4
 
< 0.1%
y 4
 
< 0.1%
k 4
 
< 0.1%
Other values (13) 22
 
< 0.1%

Most occurring characters

Value | Count | Frequency (%)
u | 159373 | 99.9%
w | 11 | < 0.1%
a | 9 | < 0.1%
e | 9 | < 0.1%
f | 8 | < 0.1%
b | 5 | < 0.1%
c | 5 | < 0.1%
g | 4 | < 0.1%
y | 4 | < 0.1%
k | 4 | < 0.1%
Other values (18) | 33 | < 0.1%
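Character counts can differ from value counts (here `e` occurs 9 times as a character but only 8 times as a whole value) because multi-character values also contribute their letters. A minimal sketch, assuming missing entries are `None` and percentages are taken over the total number of characters (159373 / 159465 ≈ 99.9% for `u`); `char_frequencies` is a hypothetical helper.

```python
from collections import Counter

def char_frequencies(values):
    """Map each character to (count, percent of all characters)."""
    counts = Counter()
    for v in values:
        if v is not None:
            counts.update(v)  # updating with a string counts each of its characters
    total = sum(counts.values())
    return {ch: (n, 100.0 * n / total) for ch, n in counts.most_common()}

freqs = char_frequencies(["u", "u", "e", "we", None])
print(freqs["e"])  # 'e' is counted in both "e" and "we"
```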

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 159465 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 159465 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 159465 | 100.0%

veil-color
Categorical

IMBALANCE  MISSING 

Distinct: 24
Distinct (%): < 0.1%
Missing: 2740947
Missing (%): 87.9%
Memory size: 164.3 MiB
w | 279070
y | 30473
n | 30039
u | 14026
k | 13080
Other values (19) | 9310

Length

Max length: 4
Median length: 1
Mean length: 1.0000239
Min length: 1

Characters and Unicode

Total characters: 376007
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: n
2nd row: w
3rd row: w
4th row: w
5th row: n

Common Values

Value | Count | Frequency (%)
w | 279070 | 9.0%
y | 30473 | 1.0%
n | 30039 | 1.0%
u | 14026 | 0.4%
k | 13080 | 0.4%
e | 9169 | 0.3%
g | 30 | < 0.1%
p | 23 | < 0.1%
r | 14 | < 0.1%
o | 13 | < 0.1%
Other values (14) | 61 | < 0.1%
(Missing) | 2740947 | 87.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
w | 279070 | 74.2%
y | 30473 | 8.1%
n | 30039 | 8.0%
u | 14026 | 3.7%
k | 13080 | 3.5%
e | 9169 | 2.4%
g | 30 | < 0.1%
p | 23 | < 0.1%
r | 14 | < 0.1%
o | 13 | < 0.1%
Other values (14) | 61 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
w | 279070 | 74.2%
y | 30473 | 8.1%
n | 30039 | 8.0%
u | 14026 | 3.7%
k | 13080 | 3.5%
e | 9169 | 2.4%
g | 30 | < 0.1%
p | 23 | < 0.1%
r | 14 | < 0.1%
o | 13 | < 0.1%
Other values (18) | 70 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 376007 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 376007 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 376007 | 100.0%

has-ring
Categorical

IMBALANCE 

Distinct: 23
Distinct (%): < 0.1%
Missing: 24
Missing (%): < 0.1%
Memory size: 148.6 MiB
f | 2368820
t | 747982
r | 16
h | 13
c | 11
Other values (18) | 79
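The IMBALANCE badge appears to be driven by a normalized-entropy score. As a sketch, assuming the score is 1 - H(p) / log(k), where p are the value proportions and k is the number of distinct values (our reading, not a quotation of ydata-profiling's source): feeding in only the two dominant has-ring counts (f: 2368820, t: 747982) with k = 23 gives about 0.824, consistent with the 82.4% shown for has-ring in the Alerts list. `imbalance_score` is a hypothetical name.

```python
import math

def imbalance_score(counts, n_classes):
    """1 - normalized Shannon entropy: 0 = perfectly balanced, 1 = a single class."""
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(n_classes)

# The two dominant has-ring values; the 21 rare values (119 rows in total)
# are left out, which barely moves the score.
score = imbalance_score([2368820, 747982], n_classes=23)
print(round(score, 3))
```

The logarithm base cancels in the ratio, so natural logs are used throughout.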

Length

Max length: 10
Median length: 1
Mean length: 1.0000038
Min length: 1

Characters and Unicode

Total characters: 3116933
Distinct characters: 27
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: f
2nd row: t
3rd row: f
4th row: f
5th row: f

Common Values

Value | Count | Frequency (%)
f | 2368820 | 76.0%
t | 747982 | 24.0%
r | 16 | < 0.1%
h | 13 | < 0.1%
c | 11 | < 0.1%
l | 11 | < 0.1%
s | 11 | < 0.1%
p | 11 | < 0.1%
g | 8 | < 0.1%
z | 6 | < 0.1%
Other values (13) | 32 | < 0.1%
(Missing) | 24 | < 0.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
f | 2368821 | 76.0%
t | 747982 | 24.0%
r | 16 | < 0.1%
h | 13 | < 0.1%
c | 11 | < 0.1%
l | 11 | < 0.1%
s | 11 | < 0.1%
p | 11 | < 0.1%
g | 8 | < 0.1%
z | 6 | < 0.1%
Other values (13) | 32 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
f | 2368821 | 76.0%
t | 747982 | 24.0%
r | 17 | < 0.1%
h | 14 | < 0.1%
s | 12 | < 0.1%
c | 11 | < 0.1%
l | 11 | < 0.1%
p | 11 | < 0.1%
g | 9 | < 0.1%
z | 6 | < 0.1%
Other values (17) | 39 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 3116933 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 3116933 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 3116933 | 100.0%

ring-type
Categorical

IMBALANCE  MISSING 

Distinct: 40
Distinct (%): < 0.1%
Missing: 128880
Missing (%): 4.1%
Memory size: 149.4 MiB
f | 2477170
e | 120006
z | 113780
l | 73443
r | 67909
Other values (35) | 135757

Length

Max length: 20
Median length: 1
Mean length: 1.0000472
Min length: 1

Characters and Unicode

Total characters: 2988206
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 14
Unique (%): < 0.1%

Sample

1st row: f
2nd row: z
3rd row: f
4th row: f
5th row: f

Common Values

Value | Count | Frequency (%)
f | 2477170 | 79.5%
e | 120006 | 3.9%
z | 113780 | 3.7%
l | 73443 | 2.4%
r | 67909 | 2.2%
p | 67678 | 2.2%
g | 63687 | 2.0%
m | 3992 | 0.1%
t | 98 | < 0.1%
d | 37 | < 0.1%
Other values (30) | 265 | < 0.1%
(Missing) | 128880 | 4.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
f | 2477173 | 82.9%
e | 120006 | 4.0%
z | 113780 | 3.8%
l | 73443 | 2.5%
r | 67909 | 2.3%
p | 67678 | 2.3%
g | 63687 | 2.1%
m | 3992 | 0.1%
t | 98 | < 0.1%
d | 37 | < 0.1%
Other values (30) | 265 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
f | 2477173 | 82.9%
e | 120024 | 4.0%
z | 113780 | 3.8%
l | 73446 | 2.5%
r | 67921 | 2.3%
p | 67688 | 2.3%
g | 63694 | 2.1%
m | 3992 | 0.1%
t | 106 | < 0.1%
n | 45 | < 0.1%
Other values (24) | 337 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 2988206 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 2988206 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 2988206 | 100.0%

spore-print-color
Categorical

IMBALANCE  MISSING 

Distinct: 32
Distinct (%): < 0.1%
Missing: 2849682
Missing (%): 91.4%
Memory size: 164.9 MiB
k | 107310
p | 68237
w | 50173
n | 22646
r | 7975
Other values (27) | 10922

Length

Max length: 10
Median length: 1
Mean length: 1.0001983
Min length: 1

Characters and Unicode

Total characters: 267316
Distinct characters: 36
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 9
Unique (%): < 0.1%

Sample

1st row: k
2nd row: w
3rd row: k
4th row: k
5th row: p

Common Values

Value | Count | Frequency (%)
k | 107310 | 3.4%
p | 68237 | 2.2%
w | 50173 | 1.6%
n | 22646 | 0.7%
r | 7975 | 0.3%
u | 7256 | 0.2%
g | 3492 | 0.1%
y | 36 | < 0.1%
s | 21 | < 0.1%
c | 16 | < 0.1%
Other values (22) | 101 | < 0.1%
(Missing) | 2849682 | 91.4%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
k | 107310 | 40.2%
p | 68237 | 25.5%
w | 50173 | 18.8%
n | 22646 | 8.5%
r | 7975 | 3.0%
u | 7256 | 2.7%
g | 3492 | 1.3%
y | 36 | < 0.1%
s | 21 | < 0.1%
c | 16 | < 0.1%
Other values (23) | 103 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
k | 107310 | 40.1%
p | 68237 | 25.5%
w | 50173 | 18.8%
n | 22649 | 8.5%
r | 7977 | 3.0%
u | 7256 | 2.7%
g | 3492 | 1.3%
y | 36 | < 0.1%
s | 25 | < 0.1%
c | 19 | < 0.1%
Other values (26) | 142 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 267316 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 267316 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 267316 | 100.0%

habitat
Categorical

Distinct: 52
Distinct (%): < 0.1%
Missing: 45
Missing (%): < 0.1%
Memory size: 148.6 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0000677
Min length: 1

Characters and Unicode

Total characters: 3117111
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 25
Unique (%): < 0.1%

Sample

1st row: d
2nd row: d
3rd row: l
4th row: d
5th row: g

Value | Count | Frequency (%)
d | 2177573 | 69.9%
g | 454908 | 14.6%
l | 171892 | 5.5%
m | 150969 | 4.8%
h | 120138 | 3.9%
w | 18531 | 0.6%
p | 17180 | 0.6%
u | 5264 | 0.2%
e | 55 | < 0.1%
s | 52 | < 0.1%
Other values (41) | 340 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
d | 2177576 | 69.9%
g | 454910 | 14.6%
l | 171900 | 5.5%
m | 150970 | 4.8%
h | 120143 | 3.9%
w | 18531 | 0.6%
p | 17190 | 0.6%
u | 5265 | 0.2%
e | 68 | < 0.1%
s | 65 | < 0.1%
Other values (27) | 493 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 3117111 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 3117111 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 3117111 | 100.0%

season
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 148.6 MiB
a | 1543321
u | 1153588
w | 278189
s | 141847

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 3116945
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: w
3rd row: w
4th row: u
5th row: a

Common Values

Value | Count | Frequency (%)
a | 1543321 | 49.5%
u | 1153588 | 37.0%
w | 278189 | 8.9%
s | 141847 | 4.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
a | 1543321 | 49.5%
u | 1153588 | 37.0%
w | 278189 | 8.9%
s | 141847 | 4.6%

Most occurring characters

Value | Count | Frequency (%)
a | 1543321 | 49.5%
u | 1153588 | 37.0%
w | 278189 | 8.9%
s | 141847 | 4.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 3116945 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 3116945 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 3116945 | 100.0%

Interactions

[Figure: 16 pairwise interaction plots]

Correlations

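variable | cap-diameter | class | does-bruise-or-bleed | gill-spacing | has-ring | id | ring-type | season | spore-print-color | stem-height | stem-root | stem-width | veil-color | veil-type
cap-diameter | 1.000 | 0.158 | 0.110 | 0.064 | 0.050 | 0.000 | 0.094 | 0.091 | 0.275 | 0.512 | 0.337 | 0.883 | 0.113 | 0.000
class | 0.158 | 1.000 | 0.038 | 0.140 | 0.050 | 0.000 | 0.197 | 0.149 | 0.426 | 0.073 | 0.521 | 0.218 | 0.496 | 0.003
does-bruise-or-bleed | 0.110 | 0.038 | 1.000 | 0.035 | 0.009 | 0.000 | 0.043 | 0.092 | 0.085 | 0.044 | 0.109 | 0.106 | 0.178 | 0.000
gill-spacing | 0.064 | 0.140 | 0.035 | 1.000 | 0.048 | 0.001 | 0.044 | 0.155 | 0.405 | 0.030 | 0.197 | 0.088 | 0.115 | 0.167
has-ring | 0.050 | 0.050 | 0.009 | 0.048 | 1.000 | 0.000 | 0.194 | 0.023 | 0.177 | 0.105 | 0.085 | 0.079 | 0.142 | 0.000
id | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002
ring-type | 0.094 | 0.197 | 0.043 | 0.044 | 0.194 | 0.000 | 1.000 | 0.070 | 0.261 | 0.227 | 0.142 | 0.121 | 0.180 | 0.072
season | 0.091 | 0.149 | 0.092 | 0.155 | 0.023 | 0.000 | 0.070 | 1.000 | 0.213 | 0.051 | 0.147 | 0.073 | 0.147 | 0.000
spore-print-color | 0.275 | 0.426 | 0.085 | 0.405 | 0.177 | 0.000 | 0.261 | 0.213 | 1.000 | 0.095 | 0.344 | 0.360 | 0.271 | 0.152
stem-height | 0.512 | 0.073 | 0.044 | 0.030 | 0.105 | 0.000 | 0.227 | 0.051 | 0.095 | 1.000 | 0.246 | 0.449 | 0.195 | 0.006
stem-root | 0.337 | 0.521 | 0.109 | 0.197 | 0.085 | 0.000 | 0.142 | 0.147 | 0.344 | 0.246 | 1.000 | 0.237 | 0.327 | 0.080
stem-width | 0.883 | 0.218 | 0.106 | 0.088 | 0.079 | 0.000 | 0.121 | 0.073 | 0.360 | 0.449 | 0.237 | 1.000 | 0.211 | 0.010
veil-color | 0.113 | 0.496 | 0.178 | 0.115 | 0.142 | 0.000 | 0.180 | 0.147 | 0.271 | 0.195 | 0.327 | 0.211 | 1.000 | 0.390
veil-type | 0.000 | 0.003 | 0.000 | 0.167 | 0.000 | 0.002 | 0.072 | 0.000 | 0.152 | 0.006 | 0.080 | 0.010 | 0.390 | 1.000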
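For pairs of categorical columns, one standard association measure is Cramér's V, which runs from 0 (no association) to 1 (one column determines the other). We assume, without having checked ydata-profiling's source, that the categorical entries in this matrix are of this kind. The sketch below is the textbook, non-bias-corrected formula V = sqrt(chi2 / (n * (min(r, c) - 1))), with chi2 the Pearson chi-squared statistic of the contingency table; `cramers_v` is a hypothetical helper and the inputs are toy data.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramer's V between two categorical sequences (no bias correction)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed cell counts of the contingency table
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n  # count expected under independence
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(row), len(col))
    if k < 2:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * (k - 1)))

# Toy data: perfect association, then complete independence.
print(cramers_v(["e", "e", "p", "p"], ["b", "b", "s", "s"]))
print(cramers_v(["e", "e", "p", "p"], ["b", "s", "b", "s"]))
```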

Missing values

A simple visualization of nullity by column.
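The nullity bar chart amounts to a per-column count of non-missing cells. A minimal sketch over a list of row dictionaries, assuming `None` marks a missing cell (in the profiled pandas DataFrame, `NaN` plays that role); `non_null_counts` and the toy rows are illustrative only.

```python
def non_null_counts(rows, columns):
    """Number of non-missing cells per column."""
    return {col: sum(1 for r in rows if r.get(col) is not None) for col in columns}

# Toy rows; None marks a missing cell.
rows = [
    {"class": "e", "stem-root": None, "veil-type": None},
    {"class": "p", "stem-root": "b", "veil-type": None},
    {"class": "p", "stem-root": None, "veil-type": "u"},
]
print(non_null_counts(rows, ["class", "stem-root", "veil-type"]))
```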
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
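A nullity matrix draws one line per record and one column per variable, marking which cells are present. A plain-text approximation, assuming `None` marks a missing value; `#` for a present cell and `.` for a missing one is our own notation. The three rows use stem-root, veil-type and veil-color values from ids 0, 7 and 3116940 of the Sample table below.

```python
def nullity_matrix(rows, columns):
    """Render presence ('#') or absence ('.') of each cell, one string per row."""
    return [
        "".join("#" if r.get(col) is not None else "." for col in columns)
        for r in rows
    ]

rows = [
    {"stem-root": None, "veil-type": None, "veil-color": None},  # id 0
    {"stem-root": None, "veil-type": None, "veil-color": "w"},   # id 7
    {"stem-root": "b", "veil-type": "u", "veil-color": "w"},     # id 3116940
]
for line in nullity_matrix(rows, ["stem-root", "veil-type", "veil-color"]):
    print(line)
```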
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
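Nullity correlation, as shown in missingno-style heatmaps, is the Pearson correlation between the is-missing indicators of two columns: +1 means they are always missing together, -1 means exactly one of them is missing at a time. A stdlib sketch, assuming `None` marks missing values; a column that is always (or never) missing has zero variance, so the correlation is undefined and this hypothetical helper returns `None`. The inputs are toy data.

```python
import math

def nullity_correlation(a, b):
    """Pearson correlation of the is-missing indicators of two columns."""
    xa = [1.0 if v is None else 0.0 for v in a]
    xb = [1.0 if v is None else 0.0 for v in b]
    n = len(xa)
    ma, mb = sum(xa) / n, sum(xb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(xa, xb))
    va = sum((p - ma) ** 2 for p in xa)
    vb = sum((q - mb) ** 2 for q in xb)
    if va == 0 or vb == 0:
        return None  # undefined for a column that is always or never missing
    return cov / math.sqrt(va * vb)

veil_type = [None, None, "u", None]
veil_color = [None, None, "w", "n"]
print(nullity_correlation(veil_type, veil_color))
```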

Sample

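index | id | class | cap-diameter | cap-shape | cap-surface | cap-color | does-bruise-or-bleed | gill-attachment | gill-spacing | gill-color | stem-height | stem-width | stem-root | stem-surface | stem-color | veil-type | veil-color | has-ring | ring-type | spore-print-color | habitat | season
0 | 0 | e | 8.80 | f | s | u | f | a | c | w | 4.51 | 15.39 | NaN | NaN | w | NaN | NaN | f | f | NaN | d | a
1 | 1 | p | 4.51 | x | h | o | f | a | c | n | 4.79 | 6.48 | NaN | y | o | NaN | NaN | t | z | NaN | d | w
2 | 2 | e | 6.94 | f | s | b | f | x | c | w | 6.85 | 9.93 | NaN | s | n | NaN | NaN | f | f | NaN | l | w
3 | 3 | e | 3.88 | f | y | g | f | s | NaN | g | 4.16 | 6.53 | NaN | NaN | w | NaN | NaN | f | f | NaN | d | u
4 | 4 | e | 5.85 | x | l | w | f | d | NaN | w | 3.37 | 8.36 | NaN | NaN | w | NaN | NaN | f | f | NaN | g | a
5 | 5 | p | 4.30 | x | t | n | f | s | c | n | 5.91 | 8.20 | NaN | NaN | w | NaN | n | t | z | NaN | d | a
6 | 6 | e | 9.65 | p | y | w | f | e | c | k | 19.07 | 12.69 | NaN | s | w | NaN | NaN | t | e | NaN | g | w
7 | 7 | p | 4.55 | x | e | e | f | a | NaN | y | 8.31 | 9.77 | NaN | NaN | y | NaN | w | t | z | NaN | d | a
8 | 8 | p | 7.36 | f | h | e | f | x | d | w | 5.77 | 17.13 | b | NaN | w | NaN | NaN | f | f | NaN | d | a
9 | 9 | e | 6.45 | x | t | n | f | a | d | w | 7.13 | 12.77 | NaN | NaN | e | NaN | NaN | f | f | NaN | d | a

index | id | class | cap-diameter | cap-shape | cap-surface | cap-color | does-bruise-or-bleed | gill-attachment | gill-spacing | gill-color | stem-height | stem-width | stem-root | stem-surface | stem-color | veil-type | veil-color | has-ring | ring-type | spore-print-color | habitat | season
3116935 | 3116935 | p | 14.58 | x | d | n | f | p | NaN | p | 14.78 | 35.76 | s | y | w | NaN | NaN | f | f | NaN | d | a
3116936 | 3116936 | p | 1.70 | x | k | n | f | NaN | NaN | n | 4.77 | 1.61 | NaN | NaN | n | NaN | NaN | f | f | NaN | d | w
3116937 | 3116937 | p | 0.69 | x | g | o | f | NaN | NaN | y | 3.51 | 0.73 | NaN | NaN | y | NaN | NaN | f | f | NaN | d | u
3116938 | 3116938 | p | 9.08 | s | t | p | t | d | c | p | 8.07 | 14.70 | NaN | NaN | p | NaN | NaN | t | f | NaN | d | a
3116939 | 3116939 | p | 9.30 | o | NaN | e | f | f | f | f | 3.42 | 25.38 | NaN | g | n | NaN | NaN | f | f | NaN | d | u
3116940 | 3116940 | e | 9.29 | f | NaN | n | t | NaN | NaN | w | 12.14 | 18.81 | b | NaN | w | u | w | t | g | NaN | d | u
3116941 | 3116941 | e | 10.88 | s | NaN | w | t | d | c | p | 6.65 | 26.97 | NaN | NaN | w | NaN | NaN | f | f | NaN | d | u
3116942 | 3116942 | p | 7.82 | x | e | e | f | a | NaN | w | 9.51 | 11.06 | NaN | NaN | y | NaN | w | t | z | NaN | d | a
3116943 | 3116943 | e | 9.45 | p | i | n | t | e | NaN | p | 9.13 | 17.77 | NaN | y | w | NaN | NaN | t | p | NaN | d | u
3116944 | 3116944 | p | 3.20 | x | s | g | f | d | c | w | 2.82 | 7.79 | NaN | NaN | w | NaN | NaN | f | f | NaN | g | u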